All HF Hub posts

prithivMLmods 
posted an update 1 day ago
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ concise guides and courses that may help you make progress. 📖

• Agents Companion: https://www.kaggle.com/whitepaper-agent-companion
• Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
• Guide to building agents by OpenAI: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
• Prompt engineering by Google: https://www.kaggle.com/whitepaper-prompt-engineering
• 601 real-world gen AI use cases by Google: https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
• Prompt engineering by IBM: https://www.ibm.com/think/topics/prompt-engineering-guide
• Prompt Engineering by Anthropic: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
• Scaling AI use cases: https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
• Prompting Guide 101: https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
• AI in the Enterprise by OpenAI: https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

By Hugging Face 🤗:
• AI Agents Course by Hugging Face: https://huggingface.co/learn/agents-course/unit0/introduction
• smolagents Docs: https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
• MCP Course by Hugging Face: https://huggingface.co/learn/mcp-course/unit0/introduction
• Other Courses (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc.): https://huggingface.co/learn
clem 
posted an update 3 days ago
Today, we're unveiling two new open-source AI robots! HopeJR for $3,000 & Reachy Mini for $300 🤖🤖🤖

Let's go open-source AI robotics!
ginipick 
posted an update 1 day ago
🎨 AI Hairstyle Changer - Transform with 93 Styles! 💇‍♀️✨

🚀 Introduction
Experience 93 different hairstyles and 29 hair colors in real-time with your uploaded photo!
Transform your look instantly with this AI-powered Gradio web app.


✨ Key Features

📸 3 Simple Steps
Upload Photo - Upload a front-facing photo
Select Style - Choose from 93 hairstyles
Pick Color - Click your desired color from the 29-color palette (a rough interface sketch follows below)
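
Under the hood this is a Gradio app that wires those three inputs into a single generation call. A minimal sketch of that wiring, with a hypothetical apply_hairstyle function standing in for the Space's actual model, might look like this:

```python
import gradio as gr

HAIRSTYLES = ["Pixie Cut", "Bob", "French Braid", "Messy Bun"]  # small subset of the 93
HAIR_COLORS = ["Black", "Auburn", "Rose Gold", "Ash Blonde"]    # small subset of the 29

def apply_hairstyle(photo, style, color):
    # Hypothetical placeholder: the real Space calls an image-editing model here.
    return photo

demo = gr.Interface(
    fn=apply_hairstyle,
    inputs=[
        gr.Image(type="pil", label="Upload a front-facing photo"),
        gr.Dropdown(HAIRSTYLES, label="Hairstyle"),
        gr.Radio(HAIR_COLORS, label="Hair color"),
    ],
    outputs=gr.Image(label="Result"),
    title="AI Hairstyle Changer",
)

if __name__ == "__main__":
    demo.launch()
```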


💫 Diverse Hairstyles (93 types)

🎯 Short Cuts: Pixie Cut, Bob, Lob, Crew Cut, Undercut
🌊 Waves: Soft Waves, Hollywood Waves, Finger Waves
🎀 Braids: French Braid, Box Braids, Fishtail Braid, Cornrows
👑 Updos: Chignon, Messy Bun, Top Knot, French Twist
🌈 Special Styles: Space Buns, Dreadlocks, Mohawk, Beehive

🎨 Hair Color Palette (29 colors)

🤎 Natural Colors: Black, Browns, Blonde variations
❤️ Red Tones: Red, Auburn, Copper, Burgundy
💜 Fashion Colors: Blue, Purple, Pink, Green, Rose Gold
⚪ Cool Tones: Silver, Ash Blonde, Titanium

🌟 Key Advantages

⚡ Fast Processing: Get results in just 10-30 seconds
🎯 High Accuracy: Natural-looking transformations with AI technology
💎 Professional Quality: High-resolution output suitable for social media
🔄 Unlimited Trials: Try as many combinations as you want
📱 User-Friendly: Intuitive interface with visual color palette


💡 Perfect For

💈 Salon Consultations: Show clients potential new looks before cutting
🛍️ Personal Styling: Experiment before making a big change
🎭 Entertainment: Fun transformations for social media content
🎬 Creative Projects: Character design and visualization
👗 Fashion Industry: Match hairstyles with outfits and makeup
📸 Photography: Pre-visualization for photoshoots

LINK: ginipick/Change-Hair
VirtualOasis 
posted an update 1 day ago
Agent Mesh
Agent Mesh is an exciting framework where autonomous AI agents collaborate in a connected ecosystem, sharing information and dynamically tackling complex tasks. Think of it as a network of smart agents working together seamlessly to get things done!

Collaboration: Agents share tasks and data, boosting efficiency.
Scalability: Easily add new agents to handle bigger challenges. (A toy sketch of the idea follows below.)
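
As a toy illustration of the concept only (not Agent Mesh's actual API), a mesh can be modeled as a shared registry of agents that advertise skills, with tasks routed to whichever peer can handle them:

```python
# Toy sketch of the agent-mesh idea: agents advertise skills in a shared registry,
# and tasks are routed to whichever peer offers the needed skill.
# Hypothetical code for illustration; not the framework's real interface.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    name: str
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def handle(self, skill: str, payload: str) -> str:
        return self.skills[skill](payload)


class Mesh:
    def __init__(self) -> None:
        self.agents: List[Agent] = []

    def register(self, agent: Agent) -> None:
        # Scalability: adding capacity is just registering another agent.
        self.agents.append(agent)

    def dispatch(self, skill: str, payload: str) -> str:
        # Route the task to the first agent that advertises the skill.
        for agent in self.agents:
            if skill in agent.skills:
                return agent.handle(skill, payload)
        raise LookupError(f"no agent offers skill {skill!r}")


mesh = Mesh()
mesh.register(Agent("researcher", {"search": lambda q: f"notes about {q}"}))
mesh.register(Agent("writer", {"summarize": lambda t: t[:60] + "..."}))

notes = mesh.dispatch("search", "agent meshes")  # handled by the researcher
print(mesh.dispatch("summarize", notes))         # handled by the writer
```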

ginipick 
posted an update about 16 hours ago
🎨 FLUX VIDEO Generation - All-in-One AI Image/Video/Audio Generator

🚀 Introduction
FLUX VIDEO Generation is an all-in-one AI creative tool that generates images, videos, and audio from text prompts, powered by an NVIDIA H100 GPU for lightning-fast processing!

ginigen/Flux-VIDEO

✨ Key Features
1️⃣ Text → Image → Video 🖼️➡️🎬

Generate high-quality images from Korean/English prompts
Transform still images into natural motion videos
Multiple size presets (Instagram, YouTube, Facebook, etc.)
Demo: 1-4 seconds / Full version: up to 60 seconds

2️⃣ Image Aspect Ratio Change 🎭

Freely adjust image aspect ratios
Expand images with outpainting technology
5 alignment options (Center, Left, Right, Top, Bottom)
Real-time preview functionality

3️⃣ Video + Audio Generation 🎵

Add AI-generated audio to videos
Korean prompt support (auto-translation)
Context-aware sound generation
Powered by MMAudio technology

πŸ› οΈ Tech Stack

Image Generation: FLUX, Stable Diffusion XL (a minimal FLUX sketch follows this list)
Video Generation: TeaCache optimization
Audio Generation: MMAudio (44kHz high-quality)
Outpainting: ControlNet Union
Infrastructure: NVIDIA H100 GPU for ultra-fast generation
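
For reference, the text-to-image leg of that stack could be reproduced with diffusers' FluxPipeline. The Space's exact FLUX checkpoint and settings aren't stated in the post, so the ones below are assumptions:

```python
import torch
from diffusers import FluxPipeline

# Assumed checkpoint; the Space's actual FLUX variant isn't specified.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a cinematic sunset over Seoul, ultra-detailed",  # Korean prompts would be translated first
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("still_frame.png")  # this still would then feed the image-to-video stage
```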

💡 How to Use

Select your desired tab
Enter your prompt (Korean/English supported!)
Adjust settings
Click the generate button

🎯 Use Cases

📱 Social media content creation
🎥 YouTube Shorts/Reels
📊 Presentation materials
🎨 Creative artwork
🎵 Background sound generation
openfree 
posted an update 3 days ago
πŸŽ™οΈ Voice Clone AI Podcast Generator: Create Emotionally Rich Podcasts with Your Own Voice!

🚀 Project Introduction
Hello! Today we're excited to introduce an AI-powered solo podcast generator that delivers high-quality voice cloning with authentic emotional expression.
Transform any PDF document, web URL, or keyword into a professional podcast with just a few clicks! 📚➡️🎧

VIDraft/Voice-Clone-Podcast

✨ Key Features
1. 🎯 Multiple Input Methods

URL: Simply paste any blog or article link
PDF: Upload research papers or documents directly
Keyword: Enter a topic and AI searches for the latest information to create content

2. 🎭 Emotionally Expressive Voice Cloning
Powered by Chatterbox TTS:

🎤 Voice Cloning: Learns and replicates your unique voice
📢 Natural intonation and emotional expression
🌊 Customizable emotion intensity with the Exaggeration control
⚡ Seamless handling of long texts with automatic chunking (a minimal usage sketch follows below)
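
A minimal usage sketch along the lines of the Chatterbox TTS README (exact parameter names may differ in your installed version, so treat this as an approximation):

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Load the pretrained Chatterbox model (assumed to run on a CUDA device).
model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Welcome back to the show. Today we unpack a new paper on voice cloning."

# Clone a voice from a short reference clip and dial up the emotional intensity.
wav = model.generate(
    text,
    audio_prompt_path="my_voice_sample.wav",  # hypothetical reference recording
    exaggeration=0.7,                          # emotion-intensity control
)
ta.save("podcast_segment.wav", wav, model.sr)
```

Long scripts would be split into chunks, each synthesized this way, and the segments concatenated into the final episode.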

3. 🤖 State-of-the-Art LLM Script Generation

Professional-grade English dialogue using Private-BitSix-Mistral
12 natural conversational exchanges
Real-time web search integration for up-to-date information
Fully editable generated scripts! ✏️

💡 Use Cases
📖 Educational Content

Transform complex research papers into easy-to-understand podcasts
Create English learning materials in your own voice

📰 News & Information

Convert international articles into engaging audio content
Produce global trend analysis podcasts

🎨 Creative Content

Tell stories in English with your own voice
Build your global personal brand with custom audio content

πŸ› οΈ Tech Stack
🧠 LLM: Llama CPP + Private-BitSix-Mistral
πŸ—£οΈ TTS: Chatterbox (Voice Cloning & Emotional Expression)
πŸ” Search: Brave Search API
πŸ“„ Document Processing: LangChain + PyPDF
πŸ–₯️ Interface: Gradio
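
A minimal sketch of that document-processing leg, using LangChain's PyPDF loader plus a text splitter to produce TTS-sized chunks. The chunk sizes and file names are assumptions, not the Space's actual settings:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Pull raw text out of an uploaded PDF.
loader = PyPDFLoader("paper.pdf")  # hypothetical input file
pages = loader.load()
full_text = "\n".join(page.page_content for page in pages)

# 2. Split it into chunks so each TTS call stays short.
splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=50)
chunks = splitter.split_text(full_text)

# 3. Each chunk would then be turned into dialogue by the LLM and voiced by
#    Chatterbox TTS, and the audio segments concatenated into the podcast.
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {len(chunk)} characters")
```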
🎉 What Makes Us Special

🎤 Voice Cloning: Perfect voice replication from just a short audio sample
😊 Emotion Control
📝 Unlimited Length
🔄 Real-time Updates
dhruv3006 
posted an update 2 days ago
C/ua Cloud Containers: Computer-Use Agents in the Cloud

First cloud platform built for Computer-Use Agents. Open-source backbone. Linux/Windows/macOS desktops in your browser. Works with OpenAI, Anthropic, or any LLM. Pay only for compute time.

Our beta users have deployed 1000s of agents over the past month. Available now in 3 tiers: Small (1 vCPU/4GB), Medium (2 vCPU/8GB), Large (8 vCPU/32GB). Windows & macOS coming soon.

GitHub: https://github.com/trycua/cua (we are open source!)

Cloud Platform: https://www.trycua.com/blog/introducing-cua-cloud-containers

merve 
posted an update 2 days ago
HOT: MiMo-VL, new 7B vision LMs by Xiaomi surpassing GPT-4o (March) and competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

Not only that, but they're also MIT-licensed & usable with transformers 🔥
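
Since the checkpoints work with transformers, loading should follow the standard image-text-to-text path. A rough sketch; the checkpoint name and chat format below are assumptions, so check the model cards in the collection:

```python
from transformers import pipeline

# Assumed checkpoint name from the MiMo-VL collection; verify on the Hub.
pipe = pipeline("image-text-to-text", model="XiaomiMiMo/MiMo-VL-7B-RL")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/screenshot.png"},  # placeholder image
            {"type": "text", "text": "What should I click to open the settings menu?"},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
```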
MonsterMMORPG 
posted an update 3 days ago
VEO 3 FLOW Full Tutorial - How To Use VEO 3 in FLOW: https://youtu.be/AoEmQPU2gtg

VEO 3 is rocking the generative AI field right now. FLOW is the platform that lets you use VEO 3 with many cool features. This is an official tutorial and guide made by the Google team; I edited it slightly. I hope it is helpful.

FLOW: https://labs.google/flow/about

Veo 3 is Google DeepMind’s most advanced video generation model to date. It allows users to create high-quality, cinematic video clips from simple text prompts, making it one of the most powerful AI tools for video creation. What sets Veo 3 apart is its ability to generate videos with native audio. This means that along with stunning visuals, Veo 3 can produce synchronized dialogue, ambient sounds, and background musicβ€”all from a single prompt. For filmmakers, this is a significant leap forward, as it eliminates the need for separate audio generation or complex syncing processes. Veo 3 also excels in realism, accurately simulating real-world physics and ensuring precise lip-syncing for characters, making the generated content feel remarkably lifelike.

Introducing Flow: AI Filmmaking Made Seamless

While Veo 3 handles the heavy lifting of video and audio generation, Flow is the creative interface that brings it all together. Flow is Google’s new AI filmmaking tool, custom-designed to work with Veo 3, as well as Google’s other advanced models like Gemini (for natural language processing) and Imagen (for text-to-image generation). Flow is built to be intuitive, allowing filmmakers to describe their ideas in everyday language and see them transformed into cinematic scenes. It offers a suite of features that give creators unprecedented control over their projects, from camera movements to scene transitions, all while maintaining consistency across clips.
BFFree 
posted an update 2 days ago
I am a shy artist, primarily because I don't get motivation from sharing art publicly. I see so much new art online every day that once I begin thinking about where I fit in, the mental fatigue becomes counterproductive for me.

Recently I shared an album of hundreds of creations with a friend (and singular art fan), and he asked some questions that I felt were interesting enough to prompt this post about my process, what it teaches me, and what I am seeking.

Specifically, I have learned to take ink drawings and create renderings that reveal my actual intention. My digital art goal is to work natural details into imagined characters and landscapes that reflect my affection for abstraction, deconstruction, and humor.

My drawing goals are to be humorous and crafty about how things can be rendered just slightly incorrectly, so the viewer sees something familiar and recognizable even when it's nonsense.

My process uses hysts/ControlNet-v1-1 with Lineart, 50 steps, and a guidance scale of 14, and I give minimal descriptions that are often plain. Example: "Really real old dog, plant, and another old dog, with an alligator turtle, posing for a photography portrait".
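
As a rough sketch, the same lineart-conditioned step could be reproduced locally with diffusers. The base model and detector checkpoints below are assumptions (the Space may use different ones), but the steps and guidance scale match the settings above:

```python
import torch
from controlnet_aux import LineartDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract a lineart map from the scanned ink drawing (hypothetical local file).
detector = LineartDetector.from_pretrained("lllyasviel/Annotators")
drawing = load_image("ink_drawing.png")
lineart = detector(drawing)

# Stable Diffusion 1.5 + the v1.1 lineart ControlNet (assumed checkpoints).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Settings from the post: 50 steps, guidance scale 14, a plain prompt.
image = pipe(
    "Really real old dog, plant, and another old dog, with an alligator turtle, "
    "posing for a photography portrait",
    image=lineart,
    num_inference_steps=50,
    guidance_scale=14.0,
).images[0]
image.save("render.png")
```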

In the past few months I started taking the ControlNet render to multimodalart/flux-style-shaping and mashing up styles. Here I used a portrait of a tortoise and a dog lying next to each other on a reflective tile floor.

Last night, I took the Flux output and had it described using WillemVH/Image_To_Text_Description, which was very accurate given the image.

I then fed the prompt back into Alpha-VLLM/Lumina-Image-2.0

The last step confirmed why I prefer using sketches to language. One, I am a visual artist, so I have much better nuance with drawings than with words. Two, my mind's eye looks for the distorted. Three, MORE FUN.


